feat(webui): upload images/PDFs from the chat composer by yaozheng-fang · Pull Request #579 · volcengine/veadk-python

yaozheng-fang · 2026-06-03T09:17:17Z

What

The web UI composer's + button was inert. It now opens a small upload menu
(上传图片 / 上传文件 (PDF), ChatGPT-style rounded card) and lets users
attach images and PDFs to a chat turn, aligned with the existing
/run_sse endpoint.

How

Frontend (frontend/src/)

+ toggles a popover with two upload actions backed by hidden
<input type="file"> (image/* and application/pdf).
Files are read to base64 and sent as inline_data parts on
new_message — runSSE now prepends attachment parts before the text part.
Pending attachments render as rounded image thumbnails / PDF chips (with an ×
to remove); sent turns render the same, and history is reconstructed from
inline_data parts so reloaded sessions show them.
Per-file cap (~20 MB) with an error message.
Rebuilt veadk/webui (committed).

Backend

Images already reach the model via ADK's LiteLlm image_url data-URI path
— no change needed.
New veadk/utils/pdf_to_images.py: a before_model_callback that renders
each application/pdf part to one image/png part per page via pypdfium2,
so a vision-capable model reads it (effectively OCR; scanned PDFs included).
Page count is capped (default 10) to bound token cost; pypdfium2/pillow are
lazy-imported with a clear "install veadk-python[pdf]" error.
Added a [pdf] extra (pypdfium2 + pillow, both permissive licenses — not
PyMuPDF/AGPL).
Wired onto the a2ui_agent and basic-app demo agents; basic-app installs
[a2ui,pdf] and its README documents the attachments feature + vision-model
requirement.

Verification

tests/test_pdf_to_images.py (3 tests): PDF part replaced by one image/png
per page, original text preserved; max_pages respected; non-PDF requests
untouched. All pass.
Pyright + Ruff clean on the new Python; npm run build (tsc + vite) succeeds;
markdownlint clean on the edited READMEs.

Notes

The agent model must be vision-capable (default doubao-seed-1.6 is);
non-vision models can't consume images or PDF-as-images.
Out of scope (follow-up): literal PDF→text extraction, drag-&-drop,
paste-to-upload.

The composer "+" button now opens an upload menu (上传图片 / 上传文件 PDF). Selected files are read to base64 and sent as inline_data parts on the existing /run_sse new_message, shown as rounded thumbnails / PDF chips both while pending and on the sent user turn (also reconstructed from history). Images already reach the model via ADK's LiteLlm image_url path. PDFs are handled by a new before_model_callback (veadk.utils.pdf_to_images) that renders each page to an image/png part with pypdfium2 so a vision-capable model can read them (effectively OCR, scanned PDFs included). Page count is capped (default 10) to bound token cost. Wired onto the a2ui_agent and basic-app demo agents; added a [pdf] extra (pypdfium2 + pillow).

Two fixes found while testing image/PDF upload: - History images showed only the filename instead of the picture. ADK serialises inline_data bytes as URL-safe base64 (-_), but a data: URI needs standard base64 (+/), so the reloaded <img> failed and fell back to its alt text. Normalise base64url -> base64 in attachmentsFromParts. - A failed turn (e.g. a non-vision model rejecting an image) is delivered as a `data: {"error": ...}` SSE frame, which the client ignored — the turn just rendered nothing. Surface it as the error banner and stop the stream.

yaozheng-fang added 2 commits June 3, 2026 17:15

zakahan approved these changes Jun 4, 2026

View reviewed changes

yaozheng-fang merged commit 611e1a9 into main Jun 4, 2026
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(webui): upload images/PDFs from the chat composer#579

feat(webui): upload images/PDFs from the chat composer#579
yaozheng-fang merged 2 commits into
mainfrom
feat/composer-file-upload

yaozheng-fang commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yaozheng-fang commented Jun 3, 2026

What

How

Verification

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants